{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU",
    "gpuClass": "standard"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Introducing the Semantic Graph\n",
        "\n",
        "One of the main use cases of txtai is semantic search over a corpus of data. Semantic search provides an understanding of natural language and identifies results that have the same meaning, not necessarily the same keywords. Within an Embeddings instance sits a wealth of implied knowledge and relationships between rows. Many approximate nearest neighbor (ANN) indexes are even backed by graphs. What if we are able to tap into this knowledge?\n",
        "\n",
        "Semantic graphs, also known as knowledge graphs or semantic networks, build a graph network with semantic relationships connecting the nodes. In txtai, they can take advantage of the relationships inherently learned within an embeddings index. This opens exciting possibilities for exploring relationships, such as topics and interconnections in a dataset. \n",
        "\n",
        "This notebook introduces the semantic graph.\n",
        "\n"
      ],
      "metadata": {
        "id": "tcpqzMwjvdN2"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Install dependencies\n",
        "\n",
        "Install `txtai` and all dependencies. We'll install the graph extra for graph functionality, the pipeline extra for object detection and the similarity extra to load models with the sentence-transformers library."
      ],
      "metadata": {
        "id": "wGvazIzFTCdt"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "X4Gi8UIErK44"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "!pip install git+https://github.com/neuml/txtai#egg=txtai[graph,pipeline,similarity] datasets ipyplot"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Graph basics\n",
        "\n",
        "First we'll build a basic [graph](https://en.wikipedia.org/wiki/Graph_theory) and show how it can be used to explore relationships.\n",
        "\n",
        "The code below builds a graph of animals and the relationships between them. We'll add nodes and relationships, then run a couple of analysis functions."
      ],
      "metadata": {
        "id": "vEdSkQaKOD-C"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import networkx as nx\n",
        "\n",
        "from txtai.graph import GraphFactory\n",
        "\n",
        "# Create graph\n",
        "graph = GraphFactory.create({\"backend\": \"networkx\"})\n",
        "graph.initialize()\n",
        "\n",
        "# Add nodes\n",
        "nodes = [(0, \"dog\"), (1, \"fox\"), (2, \"wolf\"), (3, \"zebra\"), (4, \"horse\")]\n",
        "labels = {uid:text for uid, text in nodes}\n",
        "for uid, text in nodes:\n",
        "  graph.addnode(uid, text=text)\n",
        "\n",
        "# Add relationships\n",
        "edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 0.25), (3, 4, 1)]\n",
        "for source, target, weight in edges:\n",
        "  graph.addedge(source, target, weight=weight)\n",
        "\n",
        "# Print centrality and path between 0 and 4\n",
        "print(\"Centrality:\", {labels[k]:v for k, v in graph.centrality().items()})\n",
        "print(\"Path (dog->horse):\", \" -> \".join([labels[uid] for uid in graph.showpath(0, 4)]))\n",
        "\n",
        "# Visualize graph\n",
        "nx.draw(graph.backend, nx.shell_layout(graph.backend), labels=labels, with_labels=True,\n",
        "        node_size=2000, node_color=\"#03a9f4\", edge_color=\"#cfcfcf\", font_color=\"#fff\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 355
        },
        "id": "yfi007XD4Oa4",
        "outputId": "722e9c39-46e1-4432-98b7-6bd1a8d7dac1"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Centrality: {'wolf': 0.75, 'dog': 0.5, 'fox': 0.5, 'zebra': 0.5, 'horse': 0.25}\n",
            "Path (dog->horse): dog -> wolf -> zebra -> horse\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<Figure size 432x288 with 1 Axes>"
            ],
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAb4AAAEuCAYAAADx63eqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfXxdd2Hn+c95uOfec/VgyZFsS45tSbblBkhioHST4jhpTFkcILR0YSa8+iIh6e5rt2x369BdYPtihu10FnZ2CEOHmdc8FBwyLaFTGDph66SAU1AMoXRTbCAkkWPJsh3LsmRLtqT7eO757R83V9aDHyT7Ppyr+32/XnKwrIeDrXu/9/c7v+/vZxljDCIiIg3CrvUFiIiIVJOCT0REGoqCT0REGoqCT0REGoqCT0REGoqCT0REGoqCT0REGoqCT0REGopb6wsQuVGhMRybMQxOh2QKkAvBsyHhQH+LzdZmC9uyan2ZIhIRCj6pO6ExHBwLOTAacGg85KWLIY4Frg3GQEhxKsOyIAihYOCWVptdnTb3dbnsWW8rCEUamKUty6ReTOYM+4fyfH4wYDowzAawkh9eC2hyocW12Nfv8pG+GO2eAlCk0Sj4JPJSgeETR3J8eTjAtiBVuPGvmXQgNPBwr8tnb/dIugpAkUah4JNIe268wIeezzKVN6TLEHiL+Q60xSyevDPOrk6n/N9ARCJHwSeRlC0YHj2c44njQUUCbzHfgQ/3uDy20yPuaPQnspop+CRyZvKGdw1kODIVViX0SnwHdrbZPL07QXNM4SeyWin4JFJm8obdz6YZnDZkwup//4QN/S0WA/f6Cj+RVUoFdomMbKE40qtV6AFkQhicNuwdyJAt6DWhyGqk4JPIePRwjiNTYc1CryQTwuGpkI8dztX2QkSkIhR8EgnPjReqtpBlOdIF+MrxgEPjEbkgESkb3eOTmksFhh0H0oxmovej2JWweOU+Xz0/kVVEIz6puY8fyTGVj17oAUzlDZ/8qaY8RVYTjfikpiZzhk1PpWp+X+9qEjacvD+p7c1EVgmN+KSm9g/lsSOeJ7YFjw/na30ZIlImGvFJzYTGsOVb0by3t1hXwmLkvb5OdRBZBTTik5o5OBYyHUQ/9ACmA8OzZyM8Hysiy6bgk5o5MBowGyx9/6vv9tmzLlo/mrMBPH36MhcrInUnWs8u0lAOjYcrOk+vlgzw3IRGfCKrgYJPaiI0hpcuVi5IKnHAwi8uhuiWuEj9U/BJTRybMVcNp9vbbf7hnT7nfiPJV++IE3/9J/WRPpeX9/qcfV+Sb749Tlfi0hcJPtjE/7TN5aW9Pi/v9QH43E6P0/cnOf+bSX7yTp83thY/3rPhX9zuMfRun9fuT/Jv3uqRuMZxfPbr1y0i9U3BJzUxOB3iXuWn7wM3u7x7IMO2AylubbN5sNfl19bZ/PNbPR54PsvN30oxkjJ89c74gs97X7fLrx5Mc+vfpHnneoddHTa3PJ1i7TdTPPB8hnOvd9E/c5vH9maLt34nzY4DKTb6Fp96Q+yq1+zaMDij6U6Reqfgk5rIFOBqs4b/+mie0YxhMgd/fTrg9jabBza7PD6c5ydTIbkQ/vBnOe64yWZL8tKo7/9+Ocdkrvj188bQErP4pRYbC3h52nDm9erE7/S5fOxw8WNnAvjsS3k+uMm9+kWb4tcVkfp2jUe6SGXkQrja2GlsXrcvVYAu3+ImD34ydemzZgM4lzVs9C1GUsWPP5m69Hl/ezbk376a50/e4rGlyeabpwL+9yM5Eg40uRY//nV/7mMtrn1f0ABZBZ9I3VPwSU149sqnG06nzYLRXdKBm+IWr6Uvhd3iQeQXjwZ88WhAZxy+dmeCP/ilGJ/+eZ5UYLjtb9KcTi//np0FxK9xH1BEok9TnVITCQdWugnKX5wIeLAnxu1tNp4Nf3yrx4/PhXOjvcV+ud3mV9bauF
ZxdJgpGEJTDMc/HQr43E6PztdvEXb7Fu9cf41Us7jmAhgRiT6N+KQm+ltsghWuEzl4NuSfvpjjP/9qnPaYxfPnCnzoR9krfnxrDP7lzjh9TRaZEL59psC/fKW45+Ynf5rjU2+M8YM9Ph2vjxr//bE83x678vcPQuhv1mtFkXqnvTqlJkJjaP1GtE9lWCxuhfzdLSdIJn2SyeTcm+vq9aNIPdEjVmrCtixuabUXLFaJujeucdi+fRupVIpUKsXY2BjpdBrXdRcEoe/7OI7mREWiSsEnNbOr0+bwVH1sW2YBd3U6xONx4vE47e3tABhjyGazc2E4NTVFJpPB8zx8318QhrataVKRKFDwSc3c1+Wyfzhgpg72fm5yYW/30oeLZVkkEgkSiQRr164FimGYTqdJpVKk02nOnz9PJpMhkUgsCELf97F0zJFI1eken9RMPZ3H152wOH4D5/GFYTgXhqW3fD6/IAyTySTxeFxhKFJhGvFJzdiWxb5+l0+/mCcV4WJ40oF9O9wbOoTWtm2amppoamqae1+hUJgbFV68eJEzZ84QBMGS+4We5ykMRcpIIz6pqcmcYdNT0V7dmbDh5P1J2r3Kh08QBAtGhalUCmPMgjBMJpPEYlffV1RErkwjPqmpds/i4V6X/ccD0hEc9fkOPNzrViX0AFzXpbW1ldbW1rn35XK5uZHhxMQEqVQK27aXjAxVqxBZHo34pOZSgWHHgWje6+v2LV7e65N0ozPVaIyZC8PSm2oVIsun4JNIODReYO9AJlKjPt+BZ+5O8PaO6IfH4lpFKpVSrULkChR8EhkffSHLExGZ8vQdeLDH5YtvjV/7gyMqDEMymcyCMMxms0tWkiYSCS2ekYai4JPIyBYMe76X4fBkWNPFLgkb3txu8917EsSvdVZRnblSraLUK1StQhqBgk8iZSZv2P1smsFpU5PwS9jQ32IxcK9Pc6wxnvhLtYr5b4VC4bIrSRWGshoo+CRyZvKGdw1kODIVVnXa03dgZ5vN07sTDRN6V3K5WgUwd59QtQqpZwo+iaRswfCxwzm+UqV7fqV7ep/b6a266c1yMMaQz+cXrCJVrULqlYJPIu3QeIEHns8ylTcVCUDfgbaYxZN3xtnVGf3Vm1GiWoXUKwWfRF4qMHzypzm+NBRgW5Rle7OkA6GBR/pcPnObF6meXj0zxsytJJ2/UXc8Hl+yklS1CqkVBZ/Ujcmc4fHhPI+9EjAdGGYDVnSkkUXxlIUW1+LRHS4P9caqtiNLI1OtQqJGwSd1JzSGg2Mhz4wGDIyHvDQdYgOuDZhiGFoUfwlCCIE3tNrc1WGzt9vl3nX2DW04LTcuDMMl9wtLtYr5U6SqVURTaAzHZgyD0yGZAuRC8GxIONDfYrO12Yr0Y0zBJ3UvNIahGcPgTPFBmC1A3Hn9QdhcfBDqyTP6giBY0jEMw3BBGKpWURulF5sHRgMOjYe8dDHEsYovNo0pvri0Aev1F5sFA7e02uzqtLmvy2XP+mi92FTwiUhk5fP5JWEIqlVUy2TOsH8oz+cHb/z2wr5+l4/0ReP2goJPROrG4lpFaapUtYrySgWGTxzJ8eXh8i8oe7jX5bO313ZBmYJPROralWoVsVhswchQtYrleW68wIdWeYVIwSciq878WsX80ypUq7iybMHw6OFc1TaK9x34cI/LYzXYNELBJyIN4Uq1isUbdDdiraLRtglU8IlIwyoUCguK9perVSSTSTzPW7Vh2Igbwyv4RETmaaRaRaMeBabgExG5hstt0A0sObqp3laSNurhzwo+EZEVulytIpVK4TjOkjCM6krS58YL3DeQiUTolfgOPL07UfHVngo+EZEyMMaQzWaXbNBdqlXM7xjWeiVpKjDsOJBmNBO9p/+uhMUr9/kV7fkp+EREKiSqtYrfeyHL4xGZ4lzMd4ol9y+8pXJTngo+EZEqulytIpfLzZ1WUVpEU6laxWTOsOmpVE0Xs1xLwoaT9ycrtr1Zfd2JFRGpc/
O3VyuZX6uYnp5mbGyMIAgqUqvYP5THjvhiVNuCx4fz7NvhVeTra8QnIhJBV6pVzL9XuNJaRWgMW74VzXt7i3UlLEbe61fkVAeN+EREIsh1XVpaWmhpaZl73/yVpOfOnePkyZNYlrXsWsXBsZDpIPqhBzAdGJ49G/KO9eVf4angExGpE7FYjDVr1rBmzRpgaa3i7NmzpFIpXNddMk3qOA4HRgNmgxu/jrs7bb7y38Tp+X/TN/7FrmA2gKdPBwo+ERG5xLIsPM/D8zza2tqAhbWKVCrFmTNn5moVz77WhaE+zi40wHMTlVmBo+ATEVlFLMsikUiQSCRYu3YtUAzDVDrNqy/WZimnbRXP4lupX1wMMcaUfXWrgk9EZJWzLIvThQSulSa76M8+sMnhP/7ypc5czIYfnQvZO5Dhj2/1+O9udog7Fn/1WsDHDufIzOv+feKWGL/fH2MmMHzqZzmePFH8wy+9zSNdgC1NFrs7Hd5/KEPcsfg/3xRja7PNhbxh/3DAH72Yv+p128CxGcO2FgWfiIis0OB0iGsDi0rrf3mywF+eLO492uLCD9/h87UTAZ+5zaOvyeKt30mTD+HP7ojzqTfE+MOfFcNqQ8Kiw7PY/K0Ud9xk8627ErwwWTzlAeCBzS7vfS7D/eeyeDbccZPNR36c5cULhjetsXjmbp/DkyFPnb5yi961YXAmZFtLecv9OoFRRKQBZApwtfKaRTHcvn+2wH8cCvidPpePHc4xmYOZAD77Up4Pblo4VvonP8+RC2FgPOTAaIEPzPvzp04H/PBciAGyIXx/POTnFwwG+NkFw9dOBNy97hoLVwwLRpjlohGfiEgDyIVwtTt8f3xrjBbX4vd/kqUzDk2uxY9/3Z/7cwuYf2LQZA5S80LpxGxIV+LSB5xKLUzZX1lr83/d5vHGVhvPhrgDXz959VQzQFbBJyIi18OzrzzF98FNDv9os8sd300TGJjIFjeyvu1v0pxOX36Y2O5B0rkUfpuSNi9evBStiz/rP90R59++mufdAwHZED6306MjfvV7dxbFgCw3TXWKiDSAhAOXWxy5s83mC2+O81s/yDLx+soXA/zpUMDndnp0vr7updu3eOeiTt2n3+QRs2FXh827ux2+fvLKJcEW1+J8tjjt+ba1Ng9sXsa4yyped7lpxCci0gD6W2yCy8x13r/Rod2D7/9aYu59hyYK/NYPsnzqjTF+sMenI27xWtrw74/l+fZY8WPOZAyTOcPJ9yZJBYbffSHHK9NXvon4P/9Dlv/ndo8/eYvHwHiBvzwZ0HaNTaiDEPqbyz8+016dIiINIDSG1m9E+1SGxRIOTL8/WfYen6Y6RUQagG1Z3NJaX0/5b2i1K3I0U339LYiIyHXb1WkT8ROJ5ljAXR2ViSgFn4hIg7ivy6WpTlZ2NLmwt7syF6vgExFpEHvW27S49THma3Ut7l2nEZ+IiNwA27LY1++SrEBFoJySDuzb4VbkEFpQ8ImINJSP9MWu66SEagoNPNRbueOTFHwiIg2k3bN4uNfFj+ioz3fgkT6X9mt0/G6EenwiIg0mFRh2HEgzmone03+3b/HyXp9kBe9FasQnItJgkq7Fk3fGIzfq8x148s54RUMPFHwiIg1pV6fDh3uiM+XpO/Bgj8vbOyp/QZrqFBFpUNmCYc/3MhyeDGu6lVnChje323z3ngRxp/J1C434REQaVNyxeGZ3gv4Wi0SN0iBhQ3+LxdO7qxN6oOATEWlozTGLgXt9drbbVZ/29J3iSG/gXp/mWPWK9Qo+EZEG1xyzOHhPggereM+vdE/vu/ckqhp6oHt8IiIyz6HxAg88n2Uqb0gXyv/1fQfaYsVVpbs6a7OyRsEnIiILpALDJ3+a40tDAbYFqTIEYNIp7sjySJ/LZ27zKl5ZuBoFn4iIXNZkzvD4cJ7HXgmYDgyzAawkMCyKpyy0uBaP7nB5qDdW0R1Zln1dCj4REbma0BgOjoU8MxowMB7y0nSIDbg2YCAoFH
AdBywIQggpHiJ7V4fN3m6Xe9fZFdtw+noo+EREZEVCYxiaMQzOhGQKMDg0Qn/fFhIO9DfbbG22KnJyernUyZGEIiISFbZlsa3FYltLsRhweGKanTfXT5yoziAiIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg1FwSciIg3FrfUFLEdoDMdmDIPTIZkC5ELwbEg40N9is7XZwrasWl+miIjUgUgGX2gMB8dCDowGHBoPeeliiGOBa4MxEFIcqloWBCEUDNzSarOr0+a+Lpc9620FoYiIXFakgm8yZ9g/lOfzgwHTgWE2ADP/AwpX/tyfTIUcngrZPxzQ4lrs63f5SF+Mdk8BKCIil1jGGHPtD6usVGD4xJEcXx4OsC1IXSXglivpQGjg4V6Xz97ukXQVgCIilXD48GF27txZ68tYtpovbnluvMCOA2n2Hw/IhOUJPSh+nUwI+48H7DiQ5tB4mb6wiIjUtZoFX7Zg+OgLWe4byDCaMaQrlEvpAoxmDHsHMnz0hSzZQs0HuCIiUkM1Cb6ZvGHP9zI8cTyoWOAtli7AE8cD3vG9DDN5hZ+ISKOqevDN5A27n01zeDKsWuiVpAvwk8mQ3c+mFX4iIg2qqsGXLRjeNZBhcNqQCav5nS/JhDA4XZz61LSniEjjqWrwPXo4x5GpsGahV5IJ4fBUyMcO52p7ISIiUnVVC77nxgtVvad3LekCfOV4oNWeIiINpirBlwoMH3o+G5nQK0kX4IHns6QCTXmKiDSKqgTfx4/kmIroYpKpvOGTP9WUp4hIo6h48E3mDPuHozPFuVi6AF8aCpjMRTOYRUSkvCoefPuH8tgR3y3MtuDx4XytL0NERKqgosEXGsPnB4OybUNWKakCPPZKQFj7bUtFRKTCKhp8B8dCputk4ch0YHj2bI17FiIiUnEVDb4DowGzwfV//pfe5vFHb4qV74KuYjaAp0/fwMWKiEhdqGjwHRoPqY/xXvHcv+cmNOITEVntKhZ8oTG8dLG+guQXF0MicDyhiIhUUMVOYD82Y3BWuJpzZ5vNf3ibx/Zmm6dHC6+PFou/PtLn8r/tiLHWs/jBRIHffSHHaKb4Z7++3uFfvdljQ8LiqycC3thq82cjAV8eXtnUpf36dW9rifgyVBERuW4VG/ENToe4K/jqMRu+8fY4fz4S0PlXKb5+KuD9NzsA/No6m39+q8cDz2e5+VspRlKGr94ZB+AmD/7iV+P84c9yrPuvKQanQ+7suL7/W64NgzP1NUoVEZGVqVjwZQqwklnDO9baxGz4wmBAYOC/nCrw/50vhtADm10eH87zk6mQXAh/+LMcd9xksyVpsbfL5RcXQv7qtQIFA//6aMCZzHVOV5ridYuIyOpVseDLhbCSsVOXb/FaemFgjaSKv+/2rbn/DcUVmOeyho2+RbdvcXLR572Wur7gM0BWwScisqpVLPg8e2Vf/EymGGTzbU4Wf386bdiSvPRnSQduiheDcjRtuHnR521MXt89OguIO9f1qSIiUicqFnwJB6wV5M/z50KCEH5vu4trwW9sdHjb2uLl/cWJgAd7YtzeZuPZ8Me3evz4XMhIynBgNOBNa2zu73ZwLPjdbS4bEte5OMUqXreIiKxeFQu+/habYAVznfkQPvDDLB/ucRn/jSQf3OTyzVPFeceDZ0P+6Ys5/vOvxjn13iRbmy0+9KMsAOdy8I+fz/LZ2z3Ovi/JLa02L5wPyV7HGpUghP7mqp7NKyIiVWaZChXXQmNo/Uaq6qetW8DIe30+/KMs3xtf2TePWyEv3HqaZNInmUySTCbxfR9rJUNXEZEGc/jwYXbu3Fnry1i2ivX4bMvillabn0
xVPvneud7h784XSBfgD3bEsIAfnV/5933jGodNm24mlUoxOzvLxMQEuVyORCIxF4TJZJJ4PK4wFBGpUxULPoBdnTaHpyq/bdkdHTb/6Y44ng0vXQz5rR9kV1xLsIC7Oh2amuI0NTXNvb9QKJBOp0mlUly8eJEzZ84QBAG+7y8IQ8/zFIYiInWgYlOdAN85U+ADP8wwUwd7Pze78PW3J3jH+muvbgmCgFQqNfeWTqcJw3BBECaTSWKx6mywLSJSS5rqnGfPepsW12KmDo4manUt7l23vIUtruvS2tpKa2vr3Pvy+fxcEE5MTJBKpbBte8nI0HUr+lcuIiLXUNFnYduy2Nfv8ukX85E+jDbpwL4dLvYNTFXGYjHWrFnDmjVrADDGkMvl5sLw7NmzpFIpXNddEIS+7+M46lCIiFRLxYcfH+mL8U9+nq/0t7khoYGHess7LWlZFvF4nHg8Tnt7O1AMw2w2OxeGU1NTZDIZYrHYkjC0bdUqREQqoeLB1+5ZPNzrsv94QDqCoz7fgYd7Xdq9yi9MsSyLRCJBIpFg7dq1QDEM0+n03AKa8+fPk8lkFqwk9X1ftQoRkTKp6OKWklRg2HEgPXeMUJR0+xYv7/VJutEJlTAM54Kw9JbP51WrEJFI0uKWy0i6Fk/eGWfvQCZSoz7fgSfvjEcq9ABs26apqWlJraK0gnR+rWLxFKlqFSIiV1e1JYa7Oh0+3OPyRESmPBO24cGeGG/vqI+FJY7j0NLSQktLy9z75tcqzp8/TyqVwhijWoWIyFVUdW39Yzs9jkyFHJ4Mq76V2XxxG3bEs3x8YwpjOut2hHS5WkUul5ubJp1fq5g/KlStQkQaWVWf/eKOxTO7E+x+Ns3gtKlJ+CVs6G+x+M5dzZw9McxrQY6NGzfWbfgt5nkenuddsVYxNjZGOp1WrUJEGlZVFrcsNpM3vGsgw5GpsKrTnr4DO9tsnt6doDlmUSgUGB4exrZttmzZ0jBP/ItrFalUikwmg+d5Cwr3qlWIyHLU2+KWmgQfQLZg+NjhHF+p0j0/34EHe1w+t9Mj7lwa3YVhyMmTJ8lms/T29jbs/bAwDMlkMgvCMJvNLllJmkgkVs3oWETKQ8G3QofGCzzwfJapvKlIAPoOtMWKq0p3dV5+RGeMYWxsjPPnz9PX10cikSj/hdShK9UqSr1C1SpEBBR81yUVGD750xxfGgqwLcqyvVnSKe7I8kify2du85ZVWTh37hyjo6P09PTQ3Nx84xexCpVqFfM36F5cqyitJFUYijQGBd8NmMwZHh/O89grAdOBYTZgRUcaWUCTCy2uxaM7XB7qja14R5bp6WlGRkbYuHHj3FZjcnWLT6tIpVIASzbobtRpZJHVTsFXBqExHBwLeWY0YGA85KXpEBtwbcAUw9Ci+EsQQgi8odXmrg6bvd0u966zb2jD6XQ6zdDQEB0dHaxbt04jlxUyxsydVjF/qnR+raK0eEa1CpH6p+CrgNAYhmYMgzMhmQJkCxB3IOFAf7PN1mar7OGUy+UYGhqiqamJm2++WeF3gxbXKkqhqFqFSP1T8K0ihUKB48ePY1lWQ9UdqsUYM7eStDQyLNUqFq8kVa1CJLrqLfg0z3QVjuPQ19fHyZMnefXVV+nr69N9qjKyLGtuhWjJ4lrFxMSEahUiUlYKvmuwLItNmzYxNjbG4OAgfX19C56opbzm3wcsmV+rmJmZ4ezZs3O1ivlTpKpViMhyKPiWwbIsNmzYgOd5HDt2jC1btizYLFoq62qnVZQO9D19+jRhGF52JanCUETmU/CtwNq1a4nFYoyMjNDd3T13mKxU3+VOq8jn8wsO9D116hTAgs25VasQEQXfCrW0tLB161aGhobI5XKsX79eI4qIiMVixGKxudMq5tcqUqkU4+PjpNNp1SpEGpwe7dfB9336+/vnwm/Tpk0KvwiyLGvutIq2tjZgaa3izJkzpNNpYrHYgpGhahUiq5fqDDegUCgwMjKCMYaenh49Ud
ap+bWKUrVCtQqR5au3OoOC7wYZYzh16hSzs7P09fXheV6tL0nK4HKnVeRyORKJxIL7hapViNRf8Gmq8wZZlsXNN9/M2bNnOXr0qOoOq8TlahWFQuGatYpkMonneQpDkQhT8JWBZVmsX79edYdVznEcmpubF5zcEQTBXBiqViFSHxR8ZdTe3k4sFuP48eN0dXVx00031fqSpMJc171sraI0Pbq4VjH/TStJRWpDj7wya25uZtu2bXMrPjds2KBX+g0mFouxZs0a1qxZAyytVZw9e5ZUKoXjOEvCUAukRCpPwVcBiUSC7du3Mzw8PFd30GrAxnWlWkU2m52bJl1cq5jfMdTPjkh5aVVnBZXqDmEY0tPTo6ktuarFtYrSaRXxeFy1Com0elvVqeCrMGMMr732GjMzM6o7yIpdrVYxv3CvWoXUUr0Fn4YgFWZZFhs3bmR8fJyjR4/S29u7YIm8yNVcq1YxPT3N2NgYQRCoViGyTAq+KrAsi3Xr1uF5HkNDQ2zevHluP0mRlVpJrWL+vULVKkSKFHxV1NbWRiwWY3h4mA0bNtDR0VHrS5JV4lq1inPnznHy5Eksy1KtQhqefuKrrKmpie3bt3Ps2DFyuRxdXV16BS4Vsdxaheu6S45uUq1CVjMtbqmRIAgYGhrC8zw2b96sVXpSE6VaxfwNulWrkJWqt8UtCr4aCsOQkZERgiCgt7dXU04SCcutVfi+r9kKARR8skLGGE6fPs3Fixfp6+sjHo/X+pJElgjDcG7xTOm/82sVpbd4PK4wbED1FnwaYtRYqe7ged7c6Q6qO0jU2LZNU1MTTU1Nc++bX6u4ePEiZ86cUa1C6oKCLyI6OzvnTnfYvHnz3IIEkai6nlrF/NMqRGpFwRcha9asoa+vj+HhYfL5vOoOUneuVauYmJgglUph2/aSkaHucUu16B5fBGWzWYaGhmhtbaW7u1vTRLKqGGPI5XJzI8PSW6lWMb9aoVpFNIXGcGzGMDgdkinA0eERtvduIeFAf4vN1mYLO8LPWwq+iAqCgOHhYWKxmOoOsuotrlWUVpKqVhENoTEcHAs5MBpwaDzkpYshjgWuDcZAUCjgOg6WBUEIBQO3tNrs6rS5r8tlz3o7UkGo4IuwMAw5ceIE+XxedU6ChxQAABKTSURBVAdpOFeqVcxfSer7vmoVFTSZM+wfyvP5wYDpwDAbwEoCwwKaXGhxLfb1u3ykL0a7V/t/KwVfxBljGB0d5cKFC6o7SMObX6soVStUqyi/VGD4xJEcXx4OsC1IFW78ayYdCA083Ovy2ds9km7t/n0UfHViYmKCM2fO0Nvbu2BJuUijm1+rKL0FQbDkfqFqFcvz3HiBDz2fZSpvSJch8BbzHWiLWTx5Z5xdnbW5h6vgqyMXLlzg5MmT3HzzzXMneYvIUkEQLAjCdDqtWsU1ZAuGRw/neOJ4UJHAW8x34MM9Lo/t9Ig71X1BouCrM6lUiuHhYdatW0dnZ2etL0ekbsyvVZTeVKsomskb3jWQ4chUWJXQK/Ed2Nlm8/TuBM2x6oWfgq8O5XI5jh07prqDyA0o1SrmjwoX1ypKU6WruVYxkzfsfjbN4LQhE1b/+yds6G+xGLjXr1r4KfjqVBAEHD9+HMdx2LJli5Z4i5TBlWoVnuctGBmullpFtmDY870MhyfDmoReScKGN7fbfPeeRFWmPRV8dSwMQ06ePEk2m6Wvr68hp2hEKs0YM3dc05VqFclkkkQiUXezLx99IVu1e3rX4jvwYI/LF99a+ZXrCr46Z4zhzJkzTE5O0tfXRyKRqPUliax6i2sVqVSKfD5fV7WK58YL3DeQiUTolfgOPL07UfHVngq+VeLcuXOMjo7S09OzYNNgEamOQqGw4F7h4lpFaYo0CrWKVGDYcSDNaCZ6T/9dCYtX7vMr2vNT8K0iFy9e5MSJE6o7iETE4lpFKpUCWLKStNq1it97IcvjEZniXMx3iiX3L7ylclOeCr5VplR36OzspLOzs+
avLEVkoctt0G3b9pKRYaXu2U/mDJueStV0Mcu1JGw4eX+yYtubaTXEKpNMJtm+fTtDQ0Pkcjk2btyo8BOJEM/z8Dxv7szNxbWKM2fOkE6nK1ar2D+Ux474U4JtwePDefbt8Cry9TXiW6UKhQLDw8PYts2WLVtWdQ9JZLW5Wq1i8UrSldQqQmPY8q1o3ttbrCthMfJevyKnOij4VrFS3SGTydDX16ftmUTqWBiGS06ryOVyxOPxZdcqvnOmwAd+mGEmqPLFX4dmF77+9gTvWF/+F+2a6lzFbNtm8+bNjI2NcfToUdUdROrY/PuAJfNrFTMzM5w9e5Z8Pj93XNPiWsWB0YDZKoXewXsS/PlIwJeHi9/wj94U43/YGiMIDTd/K33Nz58N4OnTgYJPVs6yLDZs2IDnebz66quqO4isIrZt09TUtODEllKtIpVKceHCBc6cOUOhUMD3ff72dCemBk/7m5IW+/pj9P11ivHs8j7HAM9NVGYFjoKvQaxdu5ZYLMbx48fZuHEj7e3ttb4kEakAx3FoaWmhpaVl7n1BEDAzm+Lo4dpss7Y5aXEuZ5YdeiW/uBhijCn7Ar3632xOlq2lpYWtW7dy+vRpxsbG0O1dkcbgui7jdjPuMgPkwR6Xv9p1qUf30l6fr9156ffD7/G5vc3mzptsnn9HgnO/keT5dyS486alkbJnnc0zuxN0+xZTv5nkS29b/kpNGzg2U/7nKQVfg/F9n+3btzM5OcmpU6cUfiINYnA6xF3mM/7AeIFdHQ4WxdWVng13vB5qvU0Wza7FiVTIU3cl+OLRgHX/NcW/eiXPU3clWLso1w6eDXnPcxlOpw1t30zxyN/nln3Nrg2DM+Wf7lTwNSDP89i+fTu5XI6hoSEKhQhu3yAiZZUpwHJf5w7PGqYDw842m7s6bb59psBo2rCjxWJ3p8Oh8QL3dbm8Oh3y5yMBBQN/cbLAKxdD3tNdxjtopnjd5abga1CO48xVHF599VXy+XytL0lEKigXwkrGTgPjIXevs7mr02FgvMD3xwvs7nTY3WkzMF6g27cYSS1M0pGUYaNfvvtxBsgq+KScLMti06ZNtLW1MTg4SDp97SXGIlKfPHtlT/gD4wXu7nTY1eEwMB4yMB6+HnzF359OG7YkF4bc5qTFa+ny3T6xgHgF9t5Q8DU4y7JYv349XV1dHDt2jOnp6VpfkohUQMKBlSyOHDhb4J51Dr4Dr6UNz40X+G83ONwUt/jJVMjTowHbW2z+8WYHx4IPbHK4pdXmr0+XsShoFa+73FRnEOBS3WFkZITu7m7Wrl1b60sSkTLqb7EJVjDXeXTGMBMYDk0U5xqnAxiaDZnIGkID53PwvkMZHtvp8W/eEufVmZD3HcpwbvlrV64pCKG/ufzjM21ZJgtkMhmGhoZYu3Yt69ev1wbXInXMGDO3zdnMbIo3/F07WVM/j+mEA9PvT5b9eUgjPlkgkUgsON1h06ZNCj+ROrD4lIfSobixWGxu67JfaoEjF2t9pcv3hla7Is8/Cj5ZIhaLsW3bNkZGRhgaGqKnp0enO4hEiDGGfD6/JOTm7+e5YcOGJef67V6f5acXA+phms8C7uqozDIUTXXKFRljOHXqFLOzs/T19eF5lTkbS0Su7konuS8+r+9aJ7DodIYiBZ9clTGGs2fPMjExQV9fH77v1/qSRFa1+ZtMl97CMFxw2kIymSQWi614GrCezuPrTlgcr9B5fJrqlKsq1R08z+PYsWNs3ryZ1tbWWl+WyKoQhuHcNGUp5ErHCiWTSdra2uju7sbzvLLc67Iti339Lp9+MU8qwhs2JR3Yt8OtSOiBRnyyAjMzMxw/fpyuri5uuummWl+OSF0px0Gy5TCZM2x6KkWmMif+lEXChpP3J2n3KvP3oBGfLFtzczPbtm2bW/G5YcMGrfgUuYz5NYLSaC6TyeB53lzAdXR0kEgksO3q7iPS7lk83Ouy/3hAOoKjPt+Bh3vdioUeaMQn1yGfzzM8PE
w8HmfTpk1Vf+CKRMm1agSlaUvf9yOzOjoVGHYciOa9vm7f4uW9PklXwScRE4Yhx48fJwxDenp6FiyZFlmt5tcI5t+Xm18jKIVc1B8Th8YL7B3IRGrU5zvwzN0J3t5R2RcICj65bsYYXnvtNWZmZlR3kFXpajWC+assr1UjiKqPvpDliYhMefpO8QDcL741fu0PvkEKPrlhZ8+eZXx8nN7eXpLJZK0vR+S6LK4RpNNpgiBYMJK73hpBVGULhj3fy3B4MqzpYpeEDW9ut/nuPQniTuX/bhV8UhZTU1OcPHmSLVu2qO4gkReG4YKpysU1gtJ/4/H4qgm5K5nJG3Y/m2Zw2tQk/BI29LdYDNzr0xyrzt+1gk/KZnZ2luHhYTZs2EBHR0etL0cEKE7JLw65WtQIomwmb3jXQIYjU2FVpz19B3a22Ty9O1G10AMFn5RZNpvl2LFjtLW10dXV1bBPJFIbxhiy2eyCkFtcIyiFnFYjL5QtGD52OMdXqnTPr3RP73M7vapMb86n4JOyC4KAoaEhPM9j8+bNeoKRirhSjcB13SUrLKNSI6gHh8YLPPB8lqm8qUgA+g60xSyevDPOrs7a/Lso+KQiwjBkZGSEIAjo7e2N/NJuib5cLrdkyrIeawT1IBUYPvnTHF8aCrAtyrK9WdKB0MAjfS6fuc2raE/vWhR8UjHGGE6fPs3Fixfp6+sjHq/8MmVZHS5XIzDGXHaFpVTOZM7w+HCex14JmA4MswErOtLIAppcaHEtHt3h8lBvrKI7siz7uhR8Umnj4+OMjY3R19enuoMsUaoRzB/NrfYaQb0JjeHgWMgzowED4yEvTYfYgGsDphiGFsVfghBCiofI3tVhs7fb5d51dsU2nL4eCj6pigsXLnDixAk2b97MmjVryvq1Q2M4NmMYnA7JFCAXgmdDwoH+FputzVakHnSN7Go1gvmF8EaoEdSz0BiGZgyDM8XHXLYAcef1x1xz8TEX5X8/BZ9UTanusH79ejo7O6/765RefR4YDTg0HvLSxRDHKr76NKb4atMGrNdffRYM3NJqs6vT5r4ulz3ro/Xqc7Uq1QjmB10mkyGRSKhGIDWl4JOqymazDA0N0draSnd394qe8CZzhv1DeT4/eOP3G/b1u3ykLxr3G1aDq9UI5o/kfN/XKl+pOQWfVF0QBAwPDxOLxZZVd0gFhk8cyfHl4fKvMHu41+Wzt9d2hVm9UY1A6p2CT2oiDENOnDhBPp+/at3hufECH1rlnaKoK51GcLkawfzRnGoEUi8UfFIzxhhGR0e5cOHCkrpDtmB49HCuajvH+w58uMflsRrsIhEli2sE6XSaMAxVI5BVRcEnNTcxMcGZM2fo7e2lqamp4fYNrJVCobBkheXiGoHv+3iep8Unsqoo+CQSSnWH9q5N3P+C11A7xVfD4hpBOp0ml8stWWGpGoE0AgWfRMbk9Cx7vpfhlUycrKndk2+1zwYrt+XUCEq9OYWcNCIFn0TGR1/I8sRwQLqGB2KWVPM06BtxpRpBLBZbMmWpGoFIkYJPIuG58QL3DWSqek/vWnwHnt6diMxqz1KNYPF9OdUIRFZGwSc1lwoMOw6kGc1E70exK2Hxyn1+TXp+V6oRzK8QqEYgsnJ6xEjNffxIjql89EIPYCpfPJ7lC2+p7JRnqUYwfzQ3v0bQ0dGhGoFImWjEJzU1mTNseipVkxWcy5Ww4eT9ybJtb3alGsHikZxqBCKVoRGf1NT+oTx2xJ/bbQseH86zb4e34s+9Vo2gtbWVDRs2qEYgUkUa8UnNhMaw5VvRvLe3WFfCYuS9/lVPdTDGkMlklqywTCQSSzZqVsiJ1I5GfFIzB8dCpoPohx7AdGB49mzIO9YXV0surhGUenPzawRr165VjUAkghR8UjMHRgNmg5V/Xn+LxVfviLO12eZTP8/xxaPX8UVWaDaAbw7P8obCzGVrBGvWrCGZTKpGIFIHFHxSM4fGwxWdp1fyBztifH885Je/kyn7NV
2JAQ5NhNg9NuvWrVONQKSOaQ5GaiI0hpcuXt9Szi1NNi9eqP4y0GPZGOvXr6e1tVWhJ1LHFHxSE8dmDNezDeZ37k5wT6fNn7zFY+o3k9y2xmb/r3iM3p/k2Lt9/o9bYlhAuwfH3+Pznq7i1GOTCy/v9fntLdcfWPbr1y0i9U3BJzUxOB3iXsdP369/P8OhiZD/5R9ytH0zxb4dLmtiFtsPpLj3bzP8do/LQ70ukzn47/8+x7/7ZY/OOHxup8eRqZA/G7n++4GuDYMzES4cisiyaL5GaiJTgBst0tgW/KNNLm/9TpqZAGYCw+dfyfPbW1z2Dwd8Z6zA108V+PbdPms9ePO30zf2DU3xukWkvmnEJzWRC+FGx04dHniOxcjspQQ9kTJ0+5fmUP90KM+tbTZPHA84n7ux72eArIJPpO4p+KQmPPvGf/gmcpArGLY0XQq6TUmL0+liENoW/Lu3xnnieJ7/cVuMrc03Vhq3gLjaCiJ1T8EnNZFw4EY3LwkN/OWpAv/sTR7NLmxOWvx+f4w/f/0+3idviWGA3/n7HJ97Jc/jvxK/se3RrOJ1i0h9U/BJTfS32ARlWCfyv/5DltkCHL0vyffvTfC1EwH7hwPe0m7z+/0xHvpxltDAv3g5jwE+/kvXf7pBEEJ/sx4yIvVOe3VKTYTG0PqNaJ/KsFjCgen3J7XPpkid08tXqQnbsriltb5+/N7Qaiv0RFaB+nrmkVVlV6dNvcSIBdzVoYeLyGqgR7LUzH1dLk110iRtcmFvd51crIhclYJPambPepsWtz7GfK2uxb3r9HARWQ30SJaasS2Lff0uyYhXBJIO7NvhXvUQWhGpHwo+qamP9MUII76uODTwUO/11yBEJFoUfFJT7Z7Fw70ufkRHfb4Dj/S5tHsa7YmsFurxSc2lAsOOA2lGM9H7Uez2LV7e65Osk3uRInJtGvFJzSVdiyfvjEdu1Oc78OSdcYWeyCqj4JNI2NXp8OGe6Ex5+g482OPy9o6IXJCIlI2mOiUysgXDnu9lODwZ1nQrs4QNb263+e49CeLXc0y8iESaRnwSGXHH4pndCfpbLBI1+slM2NDfYvH0boWeyGql4JNIaY5ZDNzrs7Pdrvq0p+8UR3oD9/o0xxR6IquVgk8ipzlmcfCeBA9W8Z5f6Z7ed+9JKPREVjnd45NIOzRe4IHns0zlDelC+b++70BbrLiqdFenFrKINAIFn0ReKjB88qc5vjQUYFuQKkMAJp3ijiyP9Ll85jZPlQWRBqLgk7oxmTM8PpznsVcCpgPDbAAr+eG1KJ6y0OJaPLrD5aHemHZkEWlACj6pO6ExHBwLeWY0YGA85KXpEBtwbcAUw9Ci+EsQQkjxENm7Omz2drvcu87WhtMiDUzBJ3UvNIahGcPgTEimANkCxB1IONDfbLO12dLJ6SIyR8EnIiINRXUGERFpKAo+ERFpKAo+ERFpKAo+ERFpKAo+ERFpKAo+ERFpKAo+ERFpKP8/Qe6gh9k59YsAAAAASUVORK5CYII=\n"
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The visualization shows the layout of the graph. We also ran a centrality function and a path function. Centrality scores how connected each node is. In this case, the `wolf` node has the highest score since it is linked to three of the other four nodes. The path function shows how the graph is traversed from `dog` to `horse`."
      ],
      "metadata": {
        "id": "6m8tNKnCPJUT"
      }
    },
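    {
      "cell_type": "markdown",
      "source": [
        "As a rough cross-check of the output above, the sketch below recomputes both values in plain Python. It assumes that centrality here means degree centrality (number of neighbors divided by `n - 1`), which matches the printed scores, and it finds a path with a breadth-first search that ignores edge weights. The helper names `degree_centrality` and `shortest_path` are illustrative and are not part of txtai.\n",
        "\n",
        "```python\n",
        "from collections import deque\n",
        "\n",
        "# Same toy graph as above: node id -> label, plus undirected weighted edges\n",
        "labels = {0: \"dog\", 1: \"fox\", 2: \"wolf\", 3: \"zebra\", 4: \"horse\"}\n",
        "edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 0.25), (3, 4, 1)]\n",
        "\n",
        "# Build an adjacency list (weights are ignored in this sketch)\n",
        "adjacency = {uid: set() for uid in labels}\n",
        "for source, target, _ in edges:\n",
        "  adjacency[source].add(target)\n",
        "  adjacency[target].add(source)\n",
        "\n",
        "def degree_centrality(adjacency):\n",
        "  # Assumed definition: number of neighbors divided by (n - 1)\n",
        "  n = len(adjacency)\n",
        "  return {uid: len(neighbors) / (n - 1) for uid, neighbors in adjacency.items()}\n",
        "\n",
        "def shortest_path(adjacency, start, end):\n",
        "  # Breadth-first search, returns a path with the fewest hops\n",
        "  queue, seen = deque([[start]]), {start}\n",
        "  while queue:\n",
        "    path = queue.popleft()\n",
        "    if path[-1] == end:\n",
        "      return path\n",
        "    for neighbor in sorted(adjacency[path[-1]]):\n",
        "      if neighbor not in seen:\n",
        "        seen.add(neighbor)\n",
        "        queue.append(path + [neighbor])\n",
        "  return None\n",
        "\n",
        "centrality = degree_centrality(adjacency)\n",
        "path = shortest_path(adjacency, 0, 4)\n",
        "\n",
        "print(\"Centrality:\", {labels[k]: v for k, v in centrality.items()})\n",
        "print(\"Path (dog->horse):\", \" -> \".join(labels[uid] for uid in path))\n",
        "```\n",
        "\n",
        "In this small graph the only route from `dog` to `horse` runs through the `wolf`-`zebra` edge, so ignoring weights does not change the path."
      ],
      "metadata": {}
    },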
    {
      "cell_type": "markdown",
      "source": [
        "# Build a Semantic Graph\n",
        "\n",
        "While txtai graphs can be standalone, with nodes and relationships manually added, the real power comes from indexing an embeddings instance.\n",
        "\n",
        "The following section builds an embeddings index over the `ag_news` dataset. `ag_news` contains news headlines from the mid-2000s. This configuration sets the familiar vector model and content settings.\n",
        "\n",
        "Column expressions are a feature available starting with txtai 5.0. A column expression assigns an alias to an expression so that SQL statements can use the alias as shorthand for it.\n",
        "\n",
        "Next comes the graph. The configuration sets the maximum number of connections to add per node (15) along with a minimum similarity score (0.1). Topic modeling parameters are also added, which we'll cover later."
      ],
      "metadata": {
        "id": "96_vksSH94cQ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "from txtai.embeddings import Embeddings\n",
        "\n",
        "# Create embeddings instance with a semantic graph\n",
        "embeddings = Embeddings({\n",
        "  \"path\": \"sentence-transformers/all-MiniLM-L6-v2\",\n",
        "  \"content\": True,\n",
        "  \"functions\": [\n",
        "    {\"name\": \"graph\", \"function\": \"graph.attribute\"},\n",
        "  ],\n",
        "  \"expressions\": [\n",
        "      {\"name\": \"category\", \"expression\": \"graph(indexid, 'category')\"},\n",
        "      {\"name\": \"topic\", \"expression\": \"graph(indexid, 'topic')\"},\n",
        "      {\"name\": \"topicrank\", \"expression\": \"graph(indexid, 'topicrank')\"}\n",
        "  ],\n",
        "  \"graph\": {\n",
        "      \"limit\": 15,\n",
        "      \"minscore\": 0.1,\n",
        "      \"topics\": {\n",
        "          \"categories\": [\"Society & Culture\", \"Science & Mathematics\", \"Health\", \"Education & Reference\", \"Computers & Internet\", \"Sports\",\n",
        "                         \"Business & Finance\", \"Entertainment & Music\", \"Family & Relationships\", \"Politics & Government\"]\n",
        "      }\n",
        "  }\n",
        "})\n",
        "\n",
        "# Load dataset\n",
        "dataset = load_dataset(\"ag_news\", split=\"train\")\n",
        "rows = dataset[\"text\"]"
      ],
      "metadata": {
        "id": "xT2TcpWDrejZ"
      },
      "execution_count": null,
      "outputs": []
    },
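    {
      "cell_type": "markdown",
      "source": [
        "To picture what the `limit` and `minscore` graph settings above control, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the 2D vectors are made-up data, a brute-force cosine similarity scan stands in for the embeddings index query, and the `cosine` helper is hypothetical rather than a txtai function.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "# Toy 2D vectors standing in for embeddings (hypothetical data)\n",
        "vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}\n",
        "\n",
        "# Analogous to the graph settings above, scaled down for four nodes\n",
        "limit, minscore = 1, 0.5\n",
        "\n",
        "def cosine(a, b):\n",
        "  # Cosine similarity between two vectors\n",
        "  dot = sum(x * y for x, y in zip(a, b))\n",
        "  return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))\n",
        "\n",
        "edges = {}\n",
        "for uid, vector in vectors.items():\n",
        "  # Score every other node, best match first\n",
        "  scores = sorted(((cosine(vector, other), oid) for oid, other in vectors.items() if oid != uid), reverse=True)\n",
        "\n",
        "  # Keep at most `limit` neighbors that meet the minimum score\n",
        "  for score, oid in scores[:limit]:\n",
        "    if score >= minscore:\n",
        "      edges[tuple(sorted((uid, oid)))] = score\n",
        "\n",
        "print(sorted(edges))\n",
        "```\n",
        "\n",
        "A higher `limit` yields a denser graph, while a higher `minscore` prunes weak connections."
      ],
      "metadata": {}
    },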
    {
      "cell_type": "code",
      "source": [
        "# Index dataset\n",
        "embeddings.index((x, text, None) for x, text in enumerate(rows))"
      ],
      "metadata": {
        "id": "SoA2HSIAr7Fb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "The embeddings index is now created. Let's explore!"
      ],
      "metadata": {
        "id": "Uyup__t3RYbE"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Topic modeling\n",
        "\n",
        "[Topic modeling](https://en.wikipedia.org/wiki/Topic_model) is an unsupervised method to identify abstract topics within a dataset. The most common way to do topic modeling is to use clustering algorithms to group nodes with the closest proximity.\n",
        "\n",
        "A number of excellent topic modeling libraries exist in Python today. [BERTopic](https://github.com/MaartenGr/BERTopic) and [Top2Vec](https://github.com/ddangelov/Top2Vec) are two of the most popular. Both use [sentence-transformers](https://github.com/UKPLab/sentence-transformers) to encode data into vectors, [UMAP](https://github.com/lmcinnes/umap) for dimensionality reduction and [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan) to cluster nodes.\n",
        "\n",
        "Given that an embeddings index has already encoded and indexed data, we'll take a different approach. txtai builds a graph by running a query for each node against the index. In addition to topic modeling, this also opens up much more functionality, which will be covered later.\n",
        "\n",
        "Topic modeling in txtai is done using [community detection](https://en.wikipedia.org/wiki/Community_structure) algorithms. Similar nodes are grouped together. There are settings to control how much granularity is used to group nodes. In other words, topics can be very specific or broad, depending on these settings. Topics are labeled by building a BM25 index over each topic and finding the most common terms associated with the topic.\n",
        "\n",
        "Let's take a closer look at the topics created with this embeddings index."
      ],
      "metadata": {
        "id": "jI9-KIrD-yKa"
      }
    },
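    {
      "cell_type": "markdown",
      "source": [
        "Before looking at the real topics, it may help to see community detection on a tiny graph. The sketch below uses weighted label propagation as a stand-in; it is not necessarily the algorithm txtai runs, and the edge list and the `label_propagation` helper are hypothetical.\n",
        "\n",
        "```python\n",
        "# Two tight triangles joined by one weak edge (hypothetical data)\n",
        "edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]\n",
        "\n",
        "# Build a weighted, undirected adjacency map\n",
        "adjacency = {}\n",
        "for source, target, weight in edges:\n",
        "  adjacency.setdefault(source, {})[target] = weight\n",
        "  adjacency.setdefault(target, {})[source] = weight\n",
        "\n",
        "def label_propagation(adjacency, maxiterations=10):\n",
        "  # Every node starts in its own community\n",
        "  labels = {node: node for node in adjacency}\n",
        "  for _ in range(maxiterations):\n",
        "    changed = False\n",
        "    for node in sorted(adjacency):\n",
        "      # Sum edge weights per neighboring community\n",
        "      totals = {}\n",
        "      for neighbor, weight in adjacency[node].items():\n",
        "        totals[labels[neighbor]] = totals.get(labels[neighbor], 0.0) + weight\n",
        "\n",
        "      # Adopt the heaviest community, breaking ties by smallest label\n",
        "      best = min(totals, key=lambda label: (-totals[label], label))\n",
        "      if best != labels[node]:\n",
        "        labels[node], changed = best, True\n",
        "\n",
        "    # Stop once a full pass makes no changes\n",
        "    if not changed:\n",
        "      break\n",
        "  return labels\n",
        "\n",
        "labels = label_propagation(adjacency)\n",
        "\n",
        "# Group node ids by their final community label\n",
        "communities = {}\n",
        "for node, label in labels.items():\n",
        "  communities.setdefault(label, []).append(node)\n",
        "communities = sorted(sorted(nodes) for nodes in communities.values())\n",
        "\n",
        "print(communities)\n",
        "```\n",
        "\n",
        "Each triangle ends up as its own community because the weak bridge between them carries too little weight to pull a node across."
      ],
      "metadata": {}
    },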
    {
      "cell_type": "code",
      "source": [
        "# Store reference to graph\n",
        "graph = embeddings.graph\n",
        "len(embeddings.graph.topics)"
      ],
      "metadata": {
        "id": "kps7fhU50xvZ",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "75ad8bd2-0fe7-40ab-9aa6-f34a6fb8fe21"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "1919"
            ]
          },
          "metadata": {},
          "execution_count": 25
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "list(graph.topics.keys())[:5]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "c-KX3xuM1EIP",
        "outputId": "277376f3-1965-4816-9cc5-5c4aa9cd85d2"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['kerry_john_bush_president',\n",
              " 'sox_red_boston_series',\n",
              " 'oil_opec_prices_said',\n",
              " 'dollar_reuters_against_euro',\n",
              " 'darfur_sudan_region_said']"
            ]
          },
          "metadata": {},
          "execution_count": 26
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The section above shows the number of topics in the index and the top 5 topics. Keep in mind that `ag_news` is from the mid-2000s, which is evident in the top topics.\n",
        "\n",
        "Given that we added SQL functions to get the topic for each row, we can use them to explore topics.\n",
        "\n",
        "Each topic is associated with a list of matching ids. Those ids are ranked by their importance to the topic in a field named `topicrank`. The section below prints the best matching text for the topic `sox_red_boston_series`."
      ],
      "metadata": {
        "id": "G9e_IkSYV9ap"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(embeddings.search(\"select text from txtai where topic = 'sox_red_boston_series' and topicrank = 0\", 1)[0][\"text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Kk2mo4dA1M4M",
        "outputId": "2039403e-4469-4196-a3cd-afd581de9c8a"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Red Sox heading to the World Series The Boston Red Sox have won the American League Championship Series and are heading to the World Series for the first time since 1986.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "In addition to topics, higher-level categories can be associated with topics. This allows for granular topics along with broader categories that encompass them. For example, the topic `sox_red_boston_series` has a category of `Sports`. See below."
      ],
      "metadata": {
        "id": "pFz8L5Z8VS5X"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "for x, topic in enumerate(list(graph.topics.keys())[:5]):\n",
        "  print(graph.categories[x], topic)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "rveal0m82G4I",
        "outputId": "fa1df06a-cbd4-4b57-8483-0f999ddfd93c"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Politics & Government kerry_john_bush_president\n",
            "Sports sox_red_boston_series\n",
            "Business & Finance oil_opec_prices_said\n",
            "Business & Finance dollar_reuters_against_euro\n",
            "Politics & Government darfur_sudan_region_said\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Topics and categories can also be used to filter results. See the difference between querying for results similar to `book` and querying for results similar to `book` with a category of `Sports`."
      ],
      "metadata": {
        "id": "durtFtCGXFSl"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(embeddings.search(\"select text from txtai where similar('book')\", 1)[0][\"text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NvFApEbn9LKM",
        "outputId": "c1409750-2f37-46fc-be5c-8f19cd166a6b"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "A Guidebook for Every Taste LANNING a trip involves many difficult decisions, but near the top of my list is standing in a bookstore trying to choose from a daunting lineup of guidebooks, a purchase that brands the owner \n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(embeddings.search(\"select text from txtai where category='Sports' and similar('book')\", 1)[0][\"text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yH3dOcNHEn-n",
        "outputId": "06e3c011-a09c-4182-c44a-55a79c016f63"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Same story for Wildcats After a game about as artful as a dime-store novel, Virginia coach Pete Gillen turned to literature to express the trying time his No.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Graph analysis\n",
        "\n",
        "Indexing an embeddings instance into a graph adds the ability to do network analysis. For example, the centrality of the graph can be analyzed to find the most central nodes. Alternatively, PageRank could be run to rank the importance of nodes within the dataset.\n",
        "\n",
        "The section below runs graph centrality and shows the associated topic for the most central nodes. The number in parentheses is the topic's index in `graph.topics`. Not surprisingly, many of these are among the top topics."
      ],
      "metadata": {
        "id": "miJN5A4o-93q"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "centrality = graph.centrality()\n",
        "\n",
        "topics = list(graph.topics.keys())\n",
        "\n",
        "for uid in list(centrality.keys())[:5]:\n",
        "  topic = graph.attribute(uid, \"topic\")\n",
        "  print(f\"{topic} ({topics.index(topic)})\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "PiprqX8g2elt",
        "outputId": "6b85a0d0-974a-4540-d51a-0934238bd472"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "peoplesoft_oracle_takeover_bid (442)\n",
            "darfur_sudan_region_said (4)\n",
            "windows_microsoft_xp_service (12)\n",
            "fallujah_us_city_iraqi (24)\n",
            "eclipse_lunar_moon_total (615)\n"
          ]
        }
      ]
    },
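    {
      "cell_type": "markdown",
      "source": [
        "PageRank is mentioned above as an alternative way to rank nodes. The block below is a minimal, standalone sketch of that idea and not txtai's implementation: `pagerank` and the toy `edges` list are hypothetical names used only for illustration, and the graph is assumed to be undirected with every node having at least one edge.\n",
        "\n",
        "```python\n",
        "def pagerank(edges, damping=0.85, iterations=50):\n",
        "    # Build an undirected adjacency list from (source, target) pairs\n",
        "    neighbors = {}\n",
        "    for a, b in edges:\n",
        "        neighbors.setdefault(a, set()).add(b)\n",
        "        neighbors.setdefault(b, set()).add(a)\n",
        "\n",
        "    count = len(neighbors)\n",
        "    ranks = {node: 1.0 / count for node in neighbors}\n",
        "\n",
        "    for _ in range(iterations):\n",
        "        updated = {}\n",
        "        for node in neighbors:\n",
        "            # Each neighbor shares its rank equally across its own edges\n",
        "            incoming = sum(ranks[n] / len(neighbors[n]) for n in neighbors[node])\n",
        "            updated[node] = (1 - damping) / count + damping * incoming\n",
        "        ranks = updated\n",
        "\n",
        "    return ranks\n",
        "\n",
        "# Toy graph (made up for this sketch): node 1 has the most connections\n",
        "edges = [(0, 1), (1, 2), (1, 3), (3, 4)]\n",
        "ranks = pagerank(edges)\n",
        "top = max(ranks, key=ranks.get)\n",
        "```\n",
        "\n",
        "In this toy graph, the best connected node ends up with the highest rank. The same kind of ranking could be compared against the centrality results above."
      ],
      "metadata": {}
    },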
    {
      "cell_type": "markdown",
      "source": [
        "# Walk the graph\n",
        "\n",
        "Given that graphs are nodes and relationships, we can traverse the nodes using those relationships. The graph can be used to show how any two nodes are connected. "
      ],
      "metadata": {
        "id": "gT-lDV5p_AIB"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from IPython.display import HTML\n",
        "\n",
        "def highlight(index, result):\n",
        "  output = f\"{index}. \"\n",
        "  spans = [(token, score, \"#fff59d\" if score > 0.025 else None) for token, score in result[\"tokens\"]]\n",
        "\n",
        "  if result[\"score\"] >= 0.05 and not [color for _, _, color in spans if color]:\n",
        "    mscore = max([score for _, score, _ in spans])\n",
        "    spans = [(token, score, \"#fff59d\" if score == mscore else color) for token, score, color in spans]\n",
        "\n",
        "  for token, _, color in spans:\n",
        "    output += f\"<span style='background-color: {color}'>{token}</span> \" if color else f\"{token} \"\n",
        "\n",
        "  return output\n",
        "\n",
        "def showpath(source, target):\n",
        "  path = graph.showpath(source, target)\n",
        "  path = [graph.attribute(p, \"text\") for p in path]\n",
        "\n",
        "  sections = []\n",
        "  for x, p in enumerate(path):\n",
        "      if x == 0:\n",
        "          # Add start node\n",
        "          sections.append(f\"{x + 1}. {p}\")\n",
        "\n",
        "      if x < len(path) - 1:\n",
        "          # Explain and highlight next path element\n",
        "          results = embeddings.explain(p, [path[x + 1]], limit=1)[0]\n",
        "\n",
        "          sections.append(highlight(x + 2, results))\n",
        "\n",
        "  return HTML(\"<br/><br/>\".join(sections))"
      ],
      "metadata": {
        "id": "sxSKqhPd5AIa"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "showpath(82889, 67364)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 243
        },
        "id": "nd97WRtn54tB",
        "outputId": "12ca85b5-d42f-41f5-f879-b42dbb61206d"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "1. Photo: Famous squirrel, moose go wireless iFone will introduce ring tones, games and screen graphics based on Rocky, the flying squirrel, and his sidekick, Bullwinkle.<br/><br/>2. <span style='background-color: #fff59d'>Squirrel</span> Runs Circles Around Yanks, Tribe (AP) AP - This squirrelly newcomer caused quite a stir at Jacobs Field. A brown squirrel ran onto the field in the bottom of the third inning Wednesday night and ran circles around the New York Yankees and Cleveland Indians. <br/><br/>3. Yankees strike back Gary <span style='background-color: #fff59d'>Sheffield</span> hit a tiebreaking two-run homer in the ninth inning and the Yankees sent the fading <span style='background-color: #fff59d'>Indians</span> to their eighth straight loss, 6-4, last night in <span style='background-color: #fff59d'>Cleveland.</span> <br/><br/>4. <span style='background-color: #fff59d'>Yanks</span> Crush Red Sox Gary <span style='background-color: #fff59d'>Sheffield,</span> Derek Jeter and Jorge Posada each homer off Pedro Martinez, and the Yankees rout the Red Sox 11-1 on Sunday. <br/><br/>5. Yankees Rout Red Sox <span style='background-color: #fff59d'>Yankees'</span> bats pound the Boston Red Sox to the brink of elimination with 22 hits in a 19-8 slaughter on Saturday night, giving New York a commanding 3-0 series advantage. <br/><br/>6. Red Sox Favored Over <span style='background-color: #fff59d'>Yankees</span> in Las Vegas, by Futures Traders The Boston Red Sox, who haven #39;t won baseball #39;s World Series since 1918, are favored to beat the New York Yankees in the American League Championship Series by Las Vegas oddsmakers and futures traders. <br/><br/>7. Red Sox heading to the World Series The Boston Red Sox have <span style='background-color: #fff59d'>won</span> the American League Championship Series and are heading to the World Series for the first time since 1986. "
            ]
          },
          "metadata": {},
          "execution_count": 155
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This shows how text about a `famous squirrel` and the `Red Sox heading to the World Series` are connected. Notice how the first match pivots to a node about a squirrel running on the field during a baseball game. From there, it's a relatively logical path to the end node.\n",
        "\n",
        "This is reminiscent of the game \"six degrees of Kevin Bacon\". Try running `showpath` with calls to `random.randint(0, len(rows) - 1)`. It's oddly addictive and a fun way to explore the interconnectivity of a dataset."
      ],
      "metadata": {
        "id": "fYK6eF-MiYuQ"
      }
    },
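    {
      "cell_type": "markdown",
      "source": [
        "The random exploration suggested above can be sketched as follows. `random_pair` is a hypothetical helper, not part of txtai, and the row count of 1000 is a made-up value for the example. In this notebook, the count would be `len(rows)`.\n",
        "\n",
        "```python\n",
        "import random\n",
        "\n",
        "def random_pair(count, seed=None):\n",
        "    # Draw two distinct row ids from the range [0, count)\n",
        "    rng = random.Random(seed)\n",
        "    source, target = rng.sample(range(count), 2)\n",
        "    return source, target\n",
        "\n",
        "# Example with a made-up row count and a fixed seed for repeatable picks\n",
        "source, target = random_pair(1000, seed=42)\n",
        "```\n",
        "\n",
        "The pair can then be passed to the function defined earlier, for example `showpath(*random_pair(len(rows)))`. Sampling without replacement avoids picking the same node as both the start and the end."
      ],
      "metadata": {}
    },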
    {
      "cell_type": "markdown",
      "source": [
        "# Group images into topics\n",
        "\n",
        "Topic modeling isn't limited to text. It supports any data that can be vectorized into an embeddings index. Next we'll create an embeddings index using the `imagenette` dataset, which is a small subset of ImageNet used for image classification."
      ],
      "metadata": {
        "id": "XDopj99u-trz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "dataset = load_dataset(\"frgfm/imagenette\", \"160px\", split=\"train\")\n",
        "rows = dataset[\"image\"]\n",
        "\n",
        "# Index with content and objects\n",
        "embeddings = Embeddings({\n",
        "  \"method\": \"sentence-transformers\",\n",
        "  \"path\": \"sentence-transformers/clip-ViT-B-32\",\n",
        "  \"content\": True,\n",
        "  \"objects\": \"image\",\n",
        "  \"functions\": [\n",
        "      {\"name\": \"graph\", \"function\": \"graph.attribute\"},\n",
        "  ],\n",
        "  \"expressions\": [\n",
        "      {\"name\": \"topic\", \"expression\": \"graph(indexid, 'topic')\"},\n",
        "      {\"name\": \"topicrank\", \"expression\": \"graph(indexid, 'topicrank')\"}\n",
        "  ],\n",
        "  \"graph\": {\n",
        "      \"limit\": 15,\n",
        "      \"minscore\": 0.1,\n",
        "      \"topics\": {\n",
        "          \"resolution\": 1.0\n",
        "      }\n",
        "  }\n",
        "})\n",
        "\n",
        "embeddings.index((x, image, None) for x, image in enumerate(rows))"
      ],
      "metadata": {
        "id": "74pjo5zx92k7"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "graph = embeddings.graph\n",
        "list(graph.topics.keys())[:5]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NpQyTDSUMHIR",
        "outputId": "ea41a841-b5ce-4846-c183-384c7414a0af"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4']"
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Topic labeling for images\n",
        "\n",
        "The index is now ready. Notice how the topic names are generic. Given there is no text associated with the images, a different approach is needed. We'll use an object detection pipeline to label the best matching image per topic."
      ],
      "metadata": {
        "id": "RgA9FSNPmlaC"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import ipyplot\n",
        "\n",
        "from PIL import Image\n",
        "\n",
        "from txtai.pipeline import Objects\n",
        "\n",
        "def labels():\n",
        "  objects = Objects(classification=True, threshold=0.25)\n",
        "\n",
        "  images = embeddings.search(\"select topic, object from txtai where topicrank=0 order by topic\", 100)\n",
        "  results = objects([result[\"object\"] for result in images], flatten=True)\n",
        "\n",
        "  return {image[\"topic\"]: results[x][0].split(\",\")[0] for x, image in enumerate(images)}\n",
        "\n",
        "def scale(image, factor=1):\n",
        "  width, height = image.size\n",
        "  return image.resize((int(width / factor), int(height / factor)))\n",
        "\n",
        "images, labels = {}, labels()\n",
        "\n",
        "for topic in list(graph.topics.keys())[:5]:\n",
        "  for result in embeddings.search(f\"select topic, object from txtai where topic = '{topic}' and topicrank = 0\", len(graph.topics)):\n",
        "    images[topic] = scale(result[\"object\"])\n",
        "\n",
        "ipyplot.plot_images(list(images.values()), [labels[topic] for topic in images], img_width=150)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 269
        },
        "id": "dNzenG45On_R",
        "outputId": "d0c8a1d4-13f7-4978-ed2d-4dc3f107fe8d"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "No model was supplied, defaulted to google/vit-base-patch16-224 and revision 5dca96d (https://huggingface.co/google/vit-base-patch16-224).\n",
            "Using a pipeline without specifying a model name and revision in production is not recommended.\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "\n",
              "        <style>\n",
              "        #ipyplot-imgs-container-div-LUpNqSJKt5pp7edbiz6qJm {\n",
              "            width: 100%;\n",
              "            height: 100%;\n",
              "            margin: 0%;\n",
              "            overflow: auto;\n",
              "            position: relative;\n",
              "            overflow-y: scroll;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-placeholder-div-LUpNqSJKt5pp7edbiz6qJm {\n",
              "            width: 150px;\n",
              "            display: inline-block;\n",
              "            margin: 3px;\n",
              "            position: relative;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm {\n",
              "            width: 150px;\n",
              "            background: white;\n",
              "            display: inline-block;\n",
              "            vertical-align: top;\n",
              "            text-align: center;\n",
              "            position: relative;\n",
              "            border: 2px solid #ddd;\n",
              "            top: 0;\n",
              "            left: 0;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm span.ipyplot-img-close {\n",
              "            display: none;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm span {\n",
              "            width: 100%;\n",
              "            height: 100%;\n",
              "            position: absolute;\n",
              "            top: 0;\n",
              "            left: 0;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm img {\n",
              "            width: 150px;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm span.ipyplot-img-close:hover {\n",
              "            cursor: zoom-out;\n",
              "        }\n",
              "        div.ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm span.ipyplot-img-expand:hover {\n",
              "            cursor: zoom-in;\n",
              "        }\n",
              "\n",
              "        div[id^=ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm]:target {\n",
              "            transform: scale(2.5);\n",
              "            transform-origin: left top;\n",
              "            z-index: 5000;\n",
              "            top: 0;\n",
              "            left: 0;\n",
              "            position: absolute;\n",
              "        }\n",
              "\n",
              "        div[id^=ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm]:target span.ipyplot-img-close {\n",
              "            display: block;\n",
              "        }\n",
              "\n",
              "        div[id^=ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm]:target span.ipyplot-img-expand {\n",
              "            display: none;\n",
              "        }\n",
              "        </style>\n",
              "    <div id=\"ipyplot-imgs-container-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "    <div class=\"ipyplot-placeholder-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "        <div id=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-Q4z3b6fAaMLE5GpNyeArrT\" class=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">cassette player</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-Q4z3b6fAaMLE5GpNyeArrT\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "        <div id=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-bXsQvo3kRs25VXszP9zPm8\" class=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">garbage truck</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-bXsQvo3kRs25VXszP9zPm8\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "        <div id=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-U39EwtYRFdLcWee4WD2WAq\" class=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">golf ball</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-U39EwtYRFdLcWee4WD2WAq\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "        <div id=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-BMPDdxQLT2qCR6FkWYYBbJ\" class=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">English springer</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-BMPDdxQLT2qCR6FkWYYBbJ\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "        <div id=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-4scenSDGKpdibX3KwW7nQJ\" class=\"ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">gas pump</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-LUpNqSJKt5pp7edbiz6qJm-4scenSDGKpdibX3KwW7nQJ\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    </div>"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "As we can see, each topic is now labeled. Imagenette is a labeled dataset, so let's evaluate the accuracy of our topic modeling."
      ],
      "metadata": {
        "id": "0QbcQDqQkUhz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "def accuracy():\n",
        "  correct, total = 0, 0\n",
        "  labels = dataset[\"label\"]\n",
        "\n",
        "  for topic in graph.topics:\n",
        "    # Use the label of the topic's first image as the label for the whole topic\n",
        "    label = labels[int(graph.topics[topic][0])]\n",
        "\n",
        "    # Count the images in this topic that share that label\n",
        "    correct += sum(1 if labels[int(x)] == label else 0 for x in graph.topics[topic])\n",
        "    total += len(graph.topics[topic])\n",
        "\n",
        "  print(\"Accuracy:\", correct/total)\n",
        "\n",
        "accuracy()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "QtbxK1OWSFFV",
        "outputId": "c7a47681-d90c-42f6-df8f-f068236ffea0"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Accuracy: 0.9747597423170345\n"
          ]
        }
      ]
    },
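    {
      "cell_type": "markdown",
      "source": [
        "Not bad: 97.48% accuracy using a totally unsupervised method not even intended for image classification! Each topic takes the label of its first image, so this score reflects how consistently a topic groups images of the same class."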
      ],
      "metadata": {
        "id": "Wspya3xpkkE4"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Walk the image graph\n",
        "\n",
        "As we did before, let's walk the graph. We'll find a path between two images: `a person parachuting from the sky` and `someone holding a french horn`."
      ],
      "metadata": {
        "id": "4gfxy-OLmwRt"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Walk the path between the two images and load the image stored for each node\n",
        "images = []\n",
        "for uid in graph.showpath(4352, 9111):\n",
        "  images.append(scale(embeddings.search(f\"select object from txtai where indexid = {uid} limit 1\")[0][\"object\"]))\n",
        "\n",
        "ipyplot.plot_images(images, img_width=150)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 233
        },
        "id": "xYNGfO7dlCze",
        "outputId": "070e9173-2725-4fd7-cbd0-11245b1a1966"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "\n",
              "        <style>\n",
              "        #ipyplot-imgs-container-div-dAYw7LvRAh2tnAzBrsEmeU {\n",
              "            width: 100%;\n",
              "            height: 100%;\n",
              "            margin: 0%;\n",
              "            overflow: auto;\n",
              "            position: relative;\n",
              "            overflow-y: scroll;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-placeholder-div-dAYw7LvRAh2tnAzBrsEmeU {\n",
              "            width: 150px;\n",
              "            display: inline-block;\n",
              "            margin: 3px;\n",
              "            position: relative;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU {\n",
              "            width: 150px;\n",
              "            background: white;\n",
              "            display: inline-block;\n",
              "            vertical-align: top;\n",
              "            text-align: center;\n",
              "            position: relative;\n",
              "            border: 2px solid #ddd;\n",
              "            top: 0;\n",
              "            left: 0;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU span.ipyplot-img-close {\n",
              "            display: none;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU span {\n",
              "            width: 100%;\n",
              "            height: 100%;\n",
              "            position: absolute;\n",
              "            top: 0;\n",
              "            left: 0;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU img {\n",
              "            width: 150px;\n",
              "        }\n",
              "\n",
              "        div.ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU span.ipyplot-img-close:hover {\n",
              "            cursor: zoom-out;\n",
              "        }\n",
              "        div.ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU span.ipyplot-img-expand:hover {\n",
              "            cursor: zoom-in;\n",
              "        }\n",
              "\n",
              "        div[id^=ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU]:target {\n",
              "            transform: scale(2.5);\n",
              "            transform-origin: left top;\n",
              "            z-index: 5000;\n",
              "            top: 0;\n",
              "            left: 0;\n",
              "            position: absolute;\n",
              "        }\n",
              "\n",
              "        div[id^=ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU]:target span.ipyplot-img-close {\n",
              "            display: block;\n",
              "        }\n",
              "\n",
              "        div[id^=ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU]:target span.ipyplot-img-expand {\n",
              "            display: none;\n",
              "        }\n",
              "        </style>\n",
              "    <div id=\"ipyplot-imgs-container-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "    <div class=\"ipyplot-placeholder-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "        <div id=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-9fZzerDjemxdDJkhMt88Uo\" class=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">0</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-9fZzerDjemxdDJkhMt88Uo\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "        <div id=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-EVPPNu7kwm4dmJweyYQ9hX\" class=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">1</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-EVPPNu7kwm4dmJweyYQ9hX\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "        <div id=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-kUxC6yATSz9HGsVZB2L2rv\" class=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">2</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-kUxC6yATSz9HGsVZB2L2rv\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "        <div id=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-f6wpvrBPVwbd5CAx6Xt6KA\" class=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">3</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-f6wpvrBPVwbd5CAx6Xt6KA\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    \n",
              "    <div class=\"ipyplot-placeholder-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "        <div id=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-bSnqGQL9CuSH22EHPxnWg4\" class=\"ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU\">\n",
              "            <h4 style=\"font-size: 12px; word-wrap: break-word;\">4</h4>\n",
              "            <img src=\"\"/>\n",
              "            <a href=\"#!\">\n",
              "                <span class=\"ipyplot-img-close\"/>\n",
              "            </a>\n",
              "            <a href=\"#ipyplot-content-div-dAYw7LvRAh2tnAzBrsEmeU-bSnqGQL9CuSH22EHPxnWg4\">\n",
              "                <span class=\"ipyplot-img-expand\"/>\n",
              "            </a>\n",
              "        </div>\n",
              "    </div>\n",
              "    </div>"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Very interesting! The first match is a person parachuting onto a football field, followed by a marching band on a field, finally leading to a person holding a French horn."
      ],
      "metadata": {
        "id": "xSIKYvl_nEYi"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Wrapping up\n",
        "\n",
        "This notebook covered quite a lot! We introduced graphs and showed how they can be used to model semantic relationships and topics. Semantic graphs make it easier to run exploratory data analysis on a dataset with txtai and quickly gain insights.\n",
        "\n",
        "This is just the beginning of what is possible, and there is a wide range of exciting new possibilities for txtai. Stay tuned!"
      ],
      "metadata": {
        "id": "8qgNudA_nXd7"
      }
    }
  ]
}
